Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation

Authors

  • Gregory Shakhnarovich
  • Ran El-Yaniv
  • Yoram Baram
Abstract

This paper is concerned with the estimation of a classifier’s accuracy. We present a number of novel bootstrap estimators, based on kernel smoothing, that consistently show superior performance on both synthetic and real data, with respect to other established methods. We call the process of (re)sampling the data via kernel-based smoothed bootstrap data cloning. The new cloning methods outperform cross-validation and the .632+ bootstrap, which, according to Efron and Tibshirani, is the estimator of choice. Finally, we extend our estimators to complex real-life data sets, in which a data point might include real, bounded, integer and nominal attributes, thus allowing for better classifier evaluation over limited real data repositories such as the UCI repository.
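The basic idea of a kernel-smoothed bootstrap can be sketched as follows. This is a minimal illustration, not the authors' estimators: it assumes real-valued attributes only, a Gaussian kernel, and a fixed bandwidth chosen by hand, and the function name `smoothed_bootstrap` and its parameters are hypothetical. Each "clone" is produced by picking an original point with replacement (the ordinary bootstrap step) and then perturbing it with kernel noise.

```python
import random


def smoothed_bootstrap(data, n_samples, bandwidth, rng=None):
    """Draw n_samples points from a Gaussian-kernel-smoothed version of data.

    data      -- list of equal-length lists of floats (real attributes only;
                 this is an assumption of the sketch, not of the paper)
    bandwidth -- standard deviation of the Gaussian kernel (assumed fixed)
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    clones = []
    for _ in range(n_samples):
        # Ordinary bootstrap step: pick an original point with replacement.
        base = rng.choice(data)
        # Smoothing step: add independent Gaussian noise to every attribute.
        clones.append([x + rng.gauss(0.0, bandwidth) for x in base])
    return clones


rng = random.Random(0)
data = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.5]]
clones = smoothed_bootstrap(data, n_samples=5, bandwidth=0.1, rng=rng)
```

With `bandwidth=0.0` the sketch reduces to the ordinary bootstrap, since every clone is then an exact copy of an original point. Bounded, integer, and nominal attributes, which the abstract says the paper's estimators cover, would need kernels other than the Gaussian one assumed here.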


Similar resources

Statistical Data Cloning for Machine Learning Research Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

This work is concerned with the estimation of a classifier’s accuracy. We first review some existing methods for error estimation, focusing on cross-validation and bootstrap, and motivate the use of kernel-based smoothing for small sample size. We use the term data cloning to refer to the process of (re)sampling the data via kernel-based smoothed bootstrap. A number of novel estimators based on...


Iterated Smoothed Bootstrap Confidence Intervals for Population Quantiles

This paper investigates the effects of smoothed bootstrap iterations on coverage probabilities of smoothed bootstrap and bootstrap-t confidence intervals for population quantiles, and establishes the optimal kernel bandwidths at various stages of the smoothing procedures. The conventional smoothed bootstrap and bootstrap-t methods have been known to yield one-sided coverage errors of orders O(n...


ROSE: A Package for Binary Imbalanced Learning

Abstract The ROSE package provides functions to deal with binary classification problems in the presence of imbalanced classes. Artificial balanced samples are generated according to a smoothed bootstrap approach and allow for aiding both the phases of estimation and accuracy evaluation of a binary classifier in the presence of a rare class. Functions that implement more traditional remedies fo...


Smoothed Bootstrap and Jackboot Sampling

We propose a bootstrap sampling method, jackboot sampling. This provides more accurate inferences than ordinary bootstrap sampling: better confidence interval coverage and less biased or unbiased standard errors. The method is simple to implement. We also prescribe a smoothing parameter for use in smoothed bootstrapping using ordinary kernel smoothing. The effect is similar to that of jackboot sampling.


On constructing accurate confidence bands for ROC curves through smooth resampling

This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics s(X) in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the "smoothed bootstrap" is introduced. Theoretical a...




Publication date: 2001